physical interaction


Robot-mediated physical Human-Human Interaction in Neurorehabilitation: a position paper

Vianello, Lorenzo, Short, Matthew, Manczurowsky, Julia, Küçüktabak, Emek Barış, Di Tommaso, Francesco, Noccaro, Alessia, Bandini, Laura, Clark, Shoshana, Fiorenza, Alaina, Lunardini, Francesca, Canton, Alberto, Gandolla, Marta, Pedrocchi, Alessandra L. G., Ambrosini, Emilia, Murie-Fernandez, Manuel, Roman, Carmen B., Tornero, Jesus, Leon, Natacha, Sawers, Andrew, Patton, Jim, Formica, Domenico, Tagliamonte, Nevio Luigi, Rauter, Georg, Baur, Kilian, Just, Fabian, Hasson, Christopher J., Novak, Vesna D., Pons, Jose L.

arXiv.org Artificial Intelligence

Neurorehabilitation conventionally relies on the interaction between a patient and a physical therapist. Robotic systems can improve and enrich the physical feedback provided to patients after neurological injury, but they under-utilize the adaptability and clinical expertise of trained therapists. In this position paper, we advocate for a novel approach that integrates the therapist's clinical expertise and nuanced decision-making with the strength, accuracy, and repeatability of robotics: Robot-mediated physical Human-Human Interaction. This framework, which enables two individuals to physically interact through robotic devices, has been studied across diverse research groups and has recently emerged as a promising link between conventional manual therapy and rehabilitation robotics, harmonizing the strengths of both approaches. This paper presents the rationale of a multidisciplinary team (including engineers, doctors, and physical therapists) for conducting research that utilizes: a unified taxonomy to describe robot-mediated rehabilitation, a framework of interaction based on social psychology, and a technological approach that makes robotic systems seamless facilitators of natural human-human interaction.


A Taxonomy of Pix Fraud in Brazil: Attack Methodologies, AI-Driven Amplification, and Defensive Strategies

Pizzolato, Glener Lanes, Lopes, Brenda Medeiros, Schepke, Claudio, Kreutz, Diego

arXiv.org Artificial Intelligence

This work presents a review of attack methodologies targeting Pix, the instant payment system launched by the Central Bank of Brazil in 2020. The study aims to identify and classify the main types of fraud affecting users and financial institutions, highlighting the evolution and increasing sophistication of these techniques. The methodology combines a structured literature review with exploratory interviews conducted with professionals from the banking sector. The results show that fraud schemes have evolved from purely social engineering approaches to hybrid strategies that integrate human manipulation with technical exploitation. The study concludes that security measures must advance at the same pace as the growing complexity of attack methodologies, with particular emphasis on adaptive defenses and continuous user awareness.


Unsupervised Learning for Physical Interaction through Video Prediction

Finn, Chelsea, Goodfellow, Ian, Levine, Sergey

Neural Information Processing Systems

A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information. However, to scale real-world interaction learning to a variety of scenes and objects, acquiring labeled data becomes increasingly impractical. To learn about physical object motion without labels, we develop an action-conditioned video prediction model that explicitly models pixel motion, by predicting a distribution over pixel motion from previous frames. Because our model explicitly predicts motion, it is partially invariant to object appearance, enabling it to generalize to previously unseen objects. To explore video prediction for real-world interactive agents, we also introduce a dataset of 59,000 robot interactions involving pushing motions, including a test set with novel objects. In this dataset, accurate prediction of videos conditioned on the robot's future actions amounts to learning a visual imagination of different futures based on different courses of action. Our experiments show that our proposed method produces more accurate video predictions both quantitatively and qualitatively, when compared to prior methods.
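The core idea above is to predict *where pixels move* rather than what they look like, which is what makes the model partially appearance-invariant. A minimal sketch of that idea, assuming an integer per-pixel flow field (the paper itself predicts distributions over motion with a learned network; the function name and flow format here are illustrative assumptions):

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Move each pixel of `frame` by the (dy, dx) offsets in `flow`.

    frame: (H, W) array; flow: (H, W, 2) integer array of per-pixel motion.
    Pixels that would move out of bounds are dropped; unfilled cells stay 0.
    Predicting motion and warping the previous frame, instead of predicting
    raw pixel values, is what yields partial invariance to object appearance.
    """
    h, w = frame.shape
    out = np.zeros_like(frame)
    ys, xs = np.mgrid[0:h, 0:w]
    ny = ys + flow[..., 0]
    nx = xs + flow[..., 1]
    valid = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)
    out[ny[valid], nx[valid]] = frame[ys[valid], xs[valid]]
    return out

# A single bright "object" pixel moving one step right under a constant flow.
frame = np.zeros((4, 4))
frame[1, 1] = 1.0
flow = np.zeros((4, 4, 2), dtype=int)
flow[..., 1] = 1  # every pixel moves +1 in x
pred = warp_with_flow(frame, flow)
```

In the full model, the flow (or motion kernels) would be produced by an action-conditioned network from previous frames; here it is fixed by hand to isolate the warping step.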


Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos

Su, Taiyi, Zhu, Jian, Li, Yaxuan, Ma, Chong, Huang, Zitai, Wang, Hanli, Xu, Yi

arXiv.org Artificial Intelligence

Embodied world models aim to predict and interact with the physical world through visual observations and actions. However, existing models struggle to accurately translate low-level actions (e.g., joint positions) into precise robotic movements in predicted frames, leading to inconsistencies with real-world physical interactions. To address these limitations, we propose MTV-World, an embodied world model that introduces Multi-view Trajectory-Video control for precise visuomotor prediction. Specifically, instead of directly using low-level actions for control, we employ trajectory videos obtained through camera intrinsic and extrinsic parameters and Cartesian-space transformation as control signals. However, projecting 3D raw actions onto 2D images inevitably causes a loss of spatial information, making a single view insufficient for accurate interaction modeling. To overcome this, we introduce a multi-view framework that compensates for spatial information loss and ensures high consistency with the physical world. MTV-World forecasts future frames by taking multi-view trajectory videos as input and conditioning on an initial frame per view. Furthermore, to systematically evaluate both robotic motion precision and object interaction accuracy, we develop an auto-evaluation pipeline leveraging multimodal large models and referring video object segmentation models. To measure spatial consistency, we formulate it as an object location matching problem and adopt the Jaccard Index as the evaluation metric. Extensive experiments demonstrate that MTV-World achieves precise control execution and accurate physical interaction modeling in complex dual-arm scenarios.
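The spatial-consistency metric named above, the Jaccard Index over matched object locations, reduces to intersection-over-union when objects are localized by boxes. A minimal sketch assuming axis-aligned boxes in (x1, y1, x2, y2) format (the box representation and function name are assumptions, not the paper's pipeline):

```python
def jaccard(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Predicted vs. ground-truth location of the same object.
iou = jaccard((0, 0, 2, 2), (1, 0, 3, 2))
```

A higher value means the object ended up where the physics of the scene says it should; in the paper this is computed over segmentation-derived object locations rather than hand-given boxes.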


Cormorant: Covariant Molecular Neural Networks

Brandon Anderson, Truong Son Hy, Risi Kondor

Neural Information Processing Systems

We propose Cormorant, a rotationally covariant neural network architecture for learning the behavior and properties of complex many-body physical systems. We apply these networks to molecular systems with two goals: learning atomic potential energy surfaces for use in Molecular Dynamics simulations, and learning ground state properties of molecules calculated by Density Functional Theory. Some of the key features of our network are that (a) each neuron explicitly corresponds to a subset of atoms; (b) the activation of each neuron is covariant to rotations, ensuring that overall the network is fully rotationally invariant. Furthermore, the non-linearity in our network is based upon tensor products and the Clebsch-Gordan decomposition, allowing the network to operate entirely in Fourier space. Cormorant significantly outperforms competing algorithms in learning molecular Potential Energy Surfaces from conformational geometries in the MD-17 dataset, and is competitive with other methods at learning geometric, energetic, electronic, and thermodynamic properties of molecules on the GDB-9 dataset.
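Cormorant obtains rotational invariance through covariant activations combined via Clebsch-Gordan products, which is well beyond a short snippet. As a much simpler illustration of the *target property*, the sketch below checks that a feature built from pairwise interatomic distances is unchanged under rotation of the molecule (the example coordinates are arbitrary):

```python
import numpy as np

def pairwise_distances(coords):
    """Pairwise atom-atom distances: a trivially rotation-invariant feature."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def rotation_z(theta):
    """3x3 rotation matrix about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])

atoms = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.5, 0.5]])
d_before = pairwise_distances(atoms)
d_after = pairwise_distances(atoms @ rotation_z(0.7).T)  # rotate, recompute
```

Cormorant's contribution is achieving this invariance while keeping richer, directional (covariant) intermediate representations rather than collapsing to scalars like distances at the input.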


Towards Proprioceptive Terrain Mapping with Quadruped Robots for Exploration in Planetary Permanently Shadowed Regions

Sanchez-Delgado, Alberto, Soares, João Carlos Virgolino, Barasuol, Victor, Semini, Claudio

arXiv.org Artificial Intelligence

Permanently Shadowed Regions (PSRs) near the lunar poles are of interest for future exploration due to their potential to contain water ice and preserve geological records. Their complex, uneven terrain favors the use of legged robots, which can traverse challenging surfaces while collecting in-situ data, and have proven effective in Earth analogs, including dark caves, when equipped with onboard lighting. While exteroceptive sensors like cameras and lidars can capture terrain geometry and even semantic information, they cannot quantify the terrain's physical interaction with the robot, a capability provided by proprioceptive sensing. We propose a terrain mapping framework for quadruped robots, which estimates elevation, foot slippage, energy cost, and stability margins from internal sensing during locomotion. These metrics are incrementally integrated into a multi-layer 2.5D gridmap that reflects terrain interaction from the robot's perspective. The system is evaluated in a simulator that mimics a lunar environment, using the 21 kg quadruped robot Aliengo, showing consistent mapping performance under lunar gravity and terrain conditions.
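The multi-layer 2.5D gridmap described above can be sketched as a set of per-cell running means, one layer per proprioceptive metric, updated at each footstep. The layer names, map extent, and resolution below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

class ProprioGridMap:
    """Minimal multi-layer 2.5D grid map updated from footstep measurements.

    Each layer (elevation, slip, energy cost) keeps a running mean per cell;
    unvisited cells stay NaN. Layer names and resolution are assumptions.
    """
    LAYERS = ("elevation", "slip", "energy")

    def __init__(self, size=20, resolution=0.1):
        self.resolution = resolution
        self.layers = {name: np.full((size, size), np.nan) for name in self.LAYERS}
        self.counts = np.zeros((size, size), dtype=int)

    def _cell(self, x, y):
        return int(x / self.resolution), int(y / self.resolution)

    def update(self, x, y, elevation, slip, energy):
        """Fold one footstep's measurements into the cell under (x, y)."""
        i, j = self._cell(x, y)
        n = self.counts[i, j]
        for name, value in zip(self.LAYERS, (elevation, slip, energy)):
            old = self.layers[name][i, j]
            self.layers[name][i, j] = value if n == 0 else (old * n + value) / (n + 1)
        self.counts[i, j] = n + 1

# Two footsteps landing in the same 10 cm cell average their measurements.
gmap = ProprioGridMap()
gmap.update(0.52, 0.33, elevation=0.02, slip=0.1, energy=5.0)
gmap.update(0.57, 0.38, elevation=0.04, slip=0.3, energy=7.0)
```

A real implementation would add stability margins as a further layer and fuse measurements with uncertainty weighting rather than a plain mean.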


A Reliable Robot Motion Planner in Complex Real-world Environments via Action Imagination

Wang, Chengjin, Zhou, Yanmin, Wang, Zhipeng, Yan, Zheng, Luan, Feng, Jiang, Shuo, Shen, Runjie, Sang, Hongrui, He, Bin

arXiv.org Artificial Intelligence

Humans and animals can make real-time adjustments to movements by imagining their action outcomes, preventing unanticipated or even catastrophic motion failures in unknown unstructured environments. Action imagination, as a refined sensorimotor strategy, leverages perception-action loops to handle physical-interaction-induced uncertainties in perception and system modeling within complex systems. Inspired by this action-awareness capability of animal intelligence, this study proposes an imagination-inspired motion planner (I-MP) framework that enhances robots' action reliability by imagining plausible spatial states for approaching. After topologizing the workspace, I-MP builds a perception-action loop that enables robots to autonomously build contact models. Leveraging fixed-point theory and the Hausdorff distance, the planner computes convergent spatial states under interaction characteristics and mission constraints. By homogeneously representing multi-dimensional environmental characteristics through work, the robot can approach the imagined spatial states via real-time computation of energy gradients. Experimental results demonstrate the practicality and robustness of I-MP in complex cluttered environments.
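The Hausdorff distance invoked above measures how far apart two spatial states (as point sets) can be in the worst case, which makes it a natural convergence criterion. A minimal NumPy sketch of the symmetric Hausdorff distance, unrelated to the paper's actual implementation:

```python
import numpy as np

def hausdorff(a, b):
    """Symmetric Hausdorff distance between point sets a (N, d) and b (M, d).

    For each point in one set, find its nearest neighbor in the other set;
    the Hausdorff distance is the worst such nearest-neighbor distance,
    taken in both directions.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Two small 2D point sets: the farthest "unmatched" point dominates.
a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[0.0, 0.0], [0.0, 3.0]])
h = hausdorff(a, b)
```

In a planner, a shrinking Hausdorff distance between the current and imagined spatial states would indicate convergence toward the imagined configuration.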


On the causality between affective impact and coordinated human-robot reactions

Frederiksen, Morten Roed, Støy, Kasper

arXiv.org Artificial Intelligence

In an effort to improve how robots function in social contexts, this paper investigates whether a robot that actively shares a reaction to an event with a human alters how the human perceives the robot's affective impact. To verify this, we created two different test setups: one to highlight and isolate the reaction element of affective robot expressions, and one to investigate the effects of applying specific timing delays to a robot reacting to a physical encounter with a human. The first test was conducted with two different groups (n=84) of human observers, a test group and a control group, both interacting with the robot. The second test was performed with 110 participants, using increasingly longer reaction delays for the robot with every ten participants. The results show a statistically significant change (p < .05) in perceived affective impact for the robots when they react to an event shared with a human observer rather than reacting at random. The results also show that for shared physical interaction, near-human reaction times from the robot are most appropriate for the scenario. The paper concludes that a delay time of around 200 ms may have the greatest impact on human observers for small-sized non-humanoid robots. It further concludes that a slightly shorter reaction time of around 100 ms is most effective when the goal is to make the human observers feel they made the biggest impact on the robot.